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Abstract 

Calculating the expected number of misclassified outcomes is a standard prob- 
lem of particular interest for rare-event searches. The Clopper-Pearson method 
allows calculation of classical confidence intervals on the amount of misclassi- 
fication if data are all drawn from the same binomial probability distribution. 
However, data is often better described by breaking it up into several bins, each 
represented by a different binomial distribution. We describe and provide an 
algorithm for calculating a classical confidence interval on the expected total 
number of misclassified events from several bins, based on calibration data with 
the same probability of misclassihcation on a bin-by-bin basis. Our method 
avoids a computationally intensive multidimensional search by introducing a 
Lagrange multiplier and performing standard root finding. This method has 
only quadratic time complexity as the number of bins, and produces confidence 
intervals that are only slightly conservative. 
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1. Introduction 

Many real-world processes can assume one of two possible outcomes; each 
independent trial or observation can be classified as either "success" or "failure," 
with the probability of success p. All such trials can be bundled together to form 
a single experiment with x successes out of a total of n trials. If the experiments 
are repeated many times, the relative frequency of successes in each experiment 
follows the binomial distribution (e.g. [if). 

For a measurement of x and n, the best estimate P — x/n of the true success 
probability p can be calculated. Since the ratio P/(l — P) is the best estimate 
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of the expected ratio of successes to failures, the best estimate of the number of 
successes Y of a second experiment that has the same success probability and 
a known number of failures b is 

P 

Y= b. (1) 

l-P V ' 

Furthermore, methods such as Clopper-Pearson's @ provide a (frequentist, or 
classical) confidence interval [Pi ow , -Phigh] with probability content f3 such that 
the fraction of experiments with Pi ow < p < Phigh is « /3 (with the lack of 
exact equality due to the discrete nature of binomial distribution; see e.g. [3j 
for comparisons of various methods). By extension, these methods may also be 
used to calculate the confidence interval [Yiow , ^high] on the expected number of 
successes of the second experiment. 

Such estimates may be particularly useful for characterizing backgrounds 
for rare-event searches. A given background event may have some probability 
p to be misclassified as a signal event. A first, "calibration" experiment may 
allow estimation of p based on the number of events n and the number x mis- 
classified as signal (the "leakage" ) . A second, "search" experiment may provide 
a measurement of the number of correctly identified background events b. If 
background events in both experiments have the same probability of correct 
classification, the expected number of misclassified events Y and a confidence 
interval [Yj ow , ^high] on the expected number may be determined. 

Often, however, in order for events in the calibration and search both to 
have the same probability of misclassification p, events with different character- 
istics (e.g. energy, position, detector, or pixel) must be considered separately, 
resulting in m separate bins of events for both calibration and search. In the i th 
calibration bin there are Xi misclassified events out of the total ni calibration 
events, resulting in a best estimate Pi = Xi/rii of the misclassification probabil- 
ity for events in that bin. For the search data, the number of correctly classified 
events in the i th bin, is known. If the true misclassification probability, pi, 
of an event in the i th bin is the same for both calibration and search data, the 
best estimate for the total expected number of misclassified events Y is 

m 

^ = EttV^ /(p) ' (2) 



where P = {P l }. 

The likelihood C that Xi events out of the total Hi calibration events in each 
bin are misclassified is simply the product of the binomial probabilities, with 

m 

r^n^a-^r 1 '' ( 3 ) 

i 

The global maximum £ of the likelihood is trivially given by the set P = {Pi} = 
{xi/rii} for all i. Note that it is possible to estimate the expected leakage only 
if no calibration bin has zero total events (i.e. rn ^ for all i). Substitution of 
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the set {Pi} into Equation [2] yields the most likely value of the total expected 
leakage Y = /(P). 

Unfortunately, for the case with multiple bins, most existing methods can- 
not be used to calculate a confidence interval on the total expected leakage. 
Here we describe a method and provide a practical algorithm for this problem. 
We use the "Unified Approach" described by Feldman and Cousins [4[ (see also 
extended to deal with nuisance variables by means of theprofile likeli- 
without the large-sample approximation used in e.g. [l|, @|; here P are 
nuisance variables since they are unknown variables for which we are not setting 
a confidence interval. 

2. Method 

For every considered value of Yq, we calculate the profile likelihood 




C r |n,x,b,P 
A = — h rv > ( 4 ) 



£(l>|n,x,b,P) 



where n = {rii}, x = {xi}, and b = {hi] are the data, and P is the combination 
of Pi, found by a search described in Sect. 12. l[ that maximizes the likelihood for 
the value of Y under test. Asymptotically (and far from physical boundaries), 
the distribution of — 21n(A) is x 2 -distributed with 1 degree of freedom Q, but 
more accurate results may be obtained by determining the expected distribution 
by Monte Carlo simulation. For each simulated experiment, x is randomly 
determined based on the P found above. For each, the best-fit values Pmc and 
Pmc are found, and then the ratio 

C (y |n,x,b,P M c 

Amc = 



C F|n,x,b,P 



MC 

is calculated. If A is larger than 1 — /3 of the simulated Amc ratios, then 
Y"o is included in the confidence interval of probability content /3. Since the 
distributions follow the binomial distribution, the uncertainties oc 1/v^mCj 
the inverse of the square root of the number of experiments. Thus, to achieve 
a relative tolerance t, conduct -/Vmc = t 2 Monte Carlo simulations. A root- 
finding algorithm hunts for the smallest and largest values of Yq that are allowed 
in order to return the desired confidence interval [Y\ ow , Ehigh] • 

2. 1 . Formulation 

A multiparameter function minimizer, such as MINUET [9], could be imple- 
mented to hunt for the combination of probabilities Pi that maximize Equation[3] 
subject to the constraint of Equation [2] However, this method may have ex- 
ponential time complexity in the worst case (lp| . making it unfeasible for the 
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analysis of more than a few bins. Furthermore, there would be some risk of 
missing the global maximum. Instead, the combination of binomial probabili- 
ties that maximize Equation [3] subject to the constraint of Equation [2] can be 
found by introducing a Lagrange multiplier, A, and solving 

^ [\n(C(P)) + A(/(P) - Y )] = , 

where lo is a given constant. A little algebra yields the solution to this equation 
for each bin i: 



n>i +Xi- Xbi ± J (Xbi - TH - Xi) 2 - \n i x l 
Pi = 2^ ' (5) 

while substituting back into Equation [5] yields an equation for the Lagrange 
multiplier: 



, n i + x i - ± V (A&i -rii- Xi) 2 - 4riiXi ^ ^ . . 



- + A6; =f a/ (A&i - ^ - ^i) 2 - 



Equation |6] is really 2 m separate equations, depending on the the signs of 
each ± term. One of the 2 m solutions yields the value of A that gives the 
most likely combination of binomial probabilities (i.e. P) for the desired total 
expected leakage lo- Fortunately, further analysis reveals a significant reduction 
in the number of viable solutions. 

For any bin with bi ^ 0, 



Tti "T" Xi Z\/TliXi 
A > Ci = : 1 



is unphysical, producing imaginary or negative probabilities. Since A must be 
physical for all bins, A is required to be less than or equal to the smallest Cj, 
i.e. X < inf{c;} = A c . For any bin with bi — 0, Cj — > oo so that it places no 
constraint on A. 

Table [1] lists the different limiting values of A and their corresponding values 
of Pi from Equation [5] The lower limit on the confidence interval must have 
Pi < Xijrii for each bin. Therefore, Table Q] indicates that the solution must use 
the negative root for each bin, reducing the problem from searching among 2 m 
solutions to solving a single equation. 

It is easiest to understand the viable solutions for the confidence interval's 
upper bound by first noting that, other than the constraint of Equation [5J each 
term in 

rn 

ln(£)=^ln(£ i ) 

i 

is independent. For each term, hi(Ci) decreases monotonically with increasing 
Pi > Pi, and there is an inflection point at Pi — P,(A = c,), with ln(£i) 
decreasing ever more slowly for larger Pi. 



4 



Table 1: Summarized analysis of the behavior of Equation [5] 





Positive Root 


Negative Root 


A < 


Pi > 1 


< Pi < Xi/rii 


A =0 


Pi=l 


Pi %i 1 ¥1% 


o < A < a 


yjxi/rii < P l < 1 


Xi/m < Pi < y/xi/tli 



For any bins i and j, it can be shown that 



<91n(£) 



dYr 



0( 



31n(£) 



dY ( 



This relation is to be expected. If, instead, the left term were larger (smaller) 
than the right term, a more likely combination with the same total Yq could be 
found by decreasing Y"oi (Xoj) an d increasing the other by the same amount. 

In a similar way, it may be shown that the combination of probabilities P 
that maximize the likelihood for a given total expected leakage Yq never in- 
cludes more than one bin with Pj > Pi (A = Cj), and hence more than one bin 
using the positive root of Equation [2j Figure [1] helps visualize the reasoning. 
Consider any two bins i and j that both use the positive roots and hence have 
Pi > Pi(X = a), Pj > Pj(X = Cj). If dln(C)/dY 0l \ Ym=k > dln(£)/dY 0j \ Yoj = k 
then dln(£)/dYoi\Y oi =k+5 > d\n(£)/dY j\ Yoj =k-s for S = k-Y j(cj), since 
d\n(C)/dY i becomes larger as Y oi is increased, and <91n(£)/ 'dY j becomes 
smaller as Yqj is decreased towards the inflection point. As a result, for a 
given total leakage, the likelihood is never maximized by having two bins use 
the positive roots of Equation [2] Since a multi-bin system can be examined in 
2-bin pieces, it follows that it is necessary to solve Equation [6] only for m + 1 
cases: the case with all negative roots, and also the m cases with one positive 
root and m—1 negative roots. Thus, the total number of equations to be solved 
is reduced from 2 m to m + 1. 

A standard root-finding algorithm may be used to hunt for a physical solution 
to each of these m+1 equations. Our implementation uses the Van Wijngaarden- 
Dekker-Brent Method [llj which combines the secant method, bisection method, 
and inverse quadratic interpolation to find bracketed roots of a given function — 
in this case, appropriate A for a given Yq. 



2.2. Benchmarking 

All tests were conducted with an Intel Core i5 M430 processor and 6 GB of 
RAM. Since each additional bin increases the length of all of the data storage 
vectors by one element, the RAM usage increases linearly with the number of 
bins. The estimated RAM usage is 16.69 ± 0.01 kB per bin. For these tests, 
five measurements of each quantity were taken and their averages were plotted. 
The error bars represent the standard deviations of those means. 
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Figure 1: Visualization to demonstrate that the total likelihood is not maximized if more than 
one bin uses the positive root in EquationfS] Any points to the right of each curve's minimum 
are given by the positive root, while points at or to the left of the minimum are given by 
the negative root. Suppose the desired total expected leakage is Yo = 20, with one possible 
combination, Yn = Y>2 = 10, shown as the triangles. Since c91n(£)/<9Yo; must increase with 
increasing Yk f° r both curves to the right of their minima, a more likely combination also 
resulting in Yo = 20 can be found by decreasing the expected leakage in the bin with the more 
negative value of 91n(£)/<9Yoi to the minimum while increasing the expected leakage in the 
other (squares) . The maximum of the likelihood may be found by continuing to decrease the 
expected leakage in the bin with the more negative value of d\n(C) / dY^i until both bins have 
equal values of dln(C)/dYoi (circles). 

For m bins, Equation [2] consists of m terms that must be calculated for a 
given value of A. To find both the upper and lower limits, m + 1 equations each 
with m terms must be solved. Therefore, finding the limits of the confidence 
interval as a function of the number of bins has time complexity C(m 2 ), as shown 
in Figure [2J The number of Monte Carlo simulations is inversely proportional to 
the square of the desired relative tolerance. Therefore, as tolerance decreases, 
the number of basic operations increases polynomially, as shown in Figure O 

2.3. Coverage Tests 

By construction, the confidence interval includes the true expected leakage 
a fraction j3 of the time if the true leakage probabilities p = P, i.e. the method 
provides correct "coverage" for the most likely combination of probabilities for 
each total expected search leakage Yo- For other values of the true leakage 
probabilities, the coverage is not exact. However, since P is the combination of 
nuisance parameters that maximizes the likelihood for the data, it may be ex- 
pected that the coverage is nearly correct for other combinations of true leakage 
probabilities, and that there is usually slight over-coverage (e.g. a claimed 90% 
confidence interval may contain the true parameters slightly more than 90% of 
the time for some combinations of parameters). 
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Figure 2: Top: Run time as a function of the number of bins m, each with = 100 calibration 
events, Xi = 5 leaking calibration events, and f>i = 10 correctly tagged events from the search 
data, for tolerance t = 0.1. The data fit time complexity 0{m?). Bottom: Run time as a 
function of tolerance t, for m = 5 bins of data identical to that used in the top plot. The data 
fit time complexity 1~ ^K' ^*)), polynomial time. 



To test this algorithm for coverage, several simulated experiments were per- 
formed. For each set of tests, values of n, b, and p were chosen as listed in 
Table [2] or in the caption to Figure [3l The calibration leakage x was simulated 
from the binomial distribution, given n and p, and the program found the most 
likely total expected leakage and corresponding 90% confidence interval. For 
each set of tests, this simulation was repeated 10000 times in order to determine 
what fraction of the intervals contained the true expected leakage. The results, 
shown in Figure |3] and Table [2] suggest that the algorithm produces confidence 
intervals with approximate coverage and slight conservatism, as expected. 



3. Application to Dark Matter Searches 

Many collaborations around the world are attempting to make direct detec- 
tions of dark matter particles (see e.g. 12- 13]). Dark matter is hypothesized 
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Figure 3: The percentage of simulated 90% confidence intervals that include the true expected 
leakage given the probability that an event will leak. Each test uses 3 bins, with ni = 1000, 
bi = 10, and Xi thrown from a binomial distribution given and the true expected leakage. 



Table 2: Total number of calibration events n, expected calibration leakage (x), number of 
correctly tagged search events b, and percentage of the time that the true expected search 
leakage was in the resulting 90% confidence interval, for the various cases described. 



n 


(x) 


b 


% In 


Description 


(10 3 ,10 3 ) 
(10 3 ,10 3 ,10 3 ) 
(10,10 3 ,10 3 ) 
(10 5 ,10 3 ,10 3 ) 
(10 3 ,10 3 ,10 3 ) 


(1,100) 
(500,5,5) 
(5,5,5) 
(5,5,5) 
(100,50,30) 


(1,100) 
(10,10,10) 
(10 4 ,10,10) 
(10,10,10) 
(10,10,10) 


93 ± 1 
90 ± 1 

89 ± 1 

90 ± 1 

91 ± 1 


bin with large {xi), bi 
bin with large (xi) 
bin with rii <C bi 
bin with rii 3> bi 
wide range of (x) 



to constitute the majority of the universe's mass, in an effort to explain many 
astrophysical observations 15]. Since dark matter particles interact very weakly 
with normal matter, dark matter detectors must be very sensitive, which makes 
the rejection of background interactions an important task. 

The CDMS II collaboration recently published results in which two candidate 
dark matter events appeared in their signal region. However, with an expected 
background of 0.9 ± 0.2 events, it was not statistically significant evidence of 
dark matter particle detection [l6j]. The background estimate was calculated 



with a Bayesian technique |17| based on three independent "calibration" data 



sets, two of which are too complicated to be considered here. The third "calibra- 
tion" set (see Table [3} provided a background estimate only for those detectors 
not on the top or bottom of a detector stack; this background estimate totaled 
0.651q'29 even t s (stat.) ±0.13 events systematic uncertainty, with a slight ex- 
pected overcoverage based on simulations. The same data, when analyzed by 
the method described above, give a total expected leakage of 0.54+o'2o events, 
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confirming the expected slight overcoverage in the CDMS II estimate. A visual 
representation of the upper and lower bound solutions are shown in Figure |4j 
Because the other calibration sets result in smaller uncertainties than the third 
calibration set, including our revised estimate of the leakage has a negligible 
effect on the total estimated background. 

Table 3: Data from the final run of the CDMS II experiment. Each detector is taken to be a 
single bin. Two bins (the end-cap detectors T3Z6 and T4Z6) have been excluded. 



Detector Designation 


rii 


X i 


bi 


T1Z2 


28 





15 


T1Z5 


49 





8 


T2Z3 


45 





8 


T2Z5 


G7 


2 


9 


T3Z2 


50 





2 


T3Z4 


48 





7 


T3Z5 


29 





4 


T4Z2 


41 





5 


T4Z4 


31 





6 


T4Z5 


44 


1 


6 


T5Z4 


59 





G 


T5Z5 


49 


1 


6 



4. Discussion 

The method described here solves the problem of calculating confidence in- 
tervals on leakage for several bins. It has quadratic time complexity, so that 
even problems with a tremendous number of bins may be handled. Further- 
more, it produces intervals that are only slightly conservative, but exhibit full 
coverage. 

A more rigorous and detailed discussion of the topics presented in this paper 
has been compiled into a technical note, along with a fully coded version of the 
algorithm [l8j |. 
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Figure 4: Plot of the likelihood d of the expected leakage Yoi in each bin versus the expected 
leakage in each bin for the CDMS Run 125 c58. Of the twelve bins under consideration, only 
the three with Xi > (T2Z5: solid, T4Z5: dashes, and T5Z5: dots) have non-zero expected 
leakage solutions for the lower limit and only one additional bin (T1Z2: dash-dots) has Y"oi > 
for the upper limit. The straight lines are tangent to each bin at the points corresponding 
to the most likely combination of expected leakages, yielding the 68% lower (left) and upper 
(right) limits of the confidence interval. For the upper limit, the overall likelihood of the 
configuration is decreased the least by increasing the expected leakage in the detector with 
the worst statistics — T1Z2. As expected, the tangent lines are all parallel. 
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